NSF PAR Search | NSF Public Access Repository


Offline Multi-task Transfer RL with Representational Penalization

Bose, Avinandan; Du, Simon S; Fazel, Maryam (April 2025, Proceedings of Machine Learning Research)
Li, Y; Mandt, S; Agrawal, S; Khan, E (Ed.)
We study the problem of representational transfer in offline Reinforcement Learning (RL), where a learner has access to episodic data from a number of source tasks collected a priori, and aims to learn a shared representation to be used in finding a good policy for a target task. Unlike online RL, where the agent interacts with the environment while learning a policy, the offline setting permits no such interactions in either the source tasks or the target task; thus multi-task offline RL can suffer from incomplete coverage. We propose an algorithm to compute pointwise uncertainty measures for the learnt representation in low-rank MDPs, and establish a data-dependent upper bound on the suboptimality of the learnt policy for the target task. Our algorithm leverages the collective exploration done by the source tasks to compensate for poor coverage of some points by a few tasks, thus overcoming the need of existing offline algorithms for uniformly good coverage to achieve meaningful transfer. We complement our theoretical results with an empirical evaluation on a rich-observation MDP that requires many samples for complete coverage. Our findings illustrate the benefits of quantifying and penalizing the uncertainty in the learnt representation.
Free, publicly-accessible full text available April 23, 2026
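
As a rough illustration of what a pointwise uncertainty measure can look like, the sketch below uses the standard elliptical bonus from linear and low-rank RL analyses, \(\sqrt{\phi^\top \Lambda^{-1} \phi}\) with \(\Lambda = \sum_k \phi_k \phi_k^\top + \lambda I\), computed from a fixed feature map. It is not the paper's algorithm. The feature vectors, the regularizer `reg`, and the penalty weight `beta` are hypothetical choices, and the paper's measure for a learnt representation may differ. The example shows how pooling data from two source tasks that cover different directions lowers the uncertainty at a point that only one task covers.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gram_matrix(features: List[Vector], reg: float) -> Matrix:
    """Regularized Gram matrix Lambda = sum_k phi_k phi_k^T + reg * I."""
    d = len(features[0])
    lam = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    for phi in features:
        for i in range(d):
            for j in range(d):
                lam[i][j] += phi[i] * phi[j]
    return lam


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def uncertainty(phi: Vector, lam: Matrix) -> float:
    """Elliptical uncertainty sqrt(phi^T Lambda^{-1} phi)."""
    v = solve(lam, phi)
    return math.sqrt(sum(p * q for p, q in zip(phi, v)))


def penalized_reward(r_hat: float, phi: Vector, lam: Matrix, beta: float) -> float:
    """Pessimistic estimate: subtract beta times the uncertainty (beta is hypothetical)."""
    return r_hat - beta * uncertainty(phi, lam)


# Hypothetical source tasks: task A only visits direction e1, task B only e2.
task_a = [[1.0, 0.0]] * 10
task_b = [[0.0, 1.0]] * 10
query = [0.0, 1.0]

u_single = uncertainty(query, gram_matrix(task_a, reg=1.0))
u_pooled = uncertainty(query, gram_matrix(task_a + task_b, reg=1.0))
```

With these inputs `u_single` is 1.0, because only the regularizer covers direction e2. `u_pooled` is \(1/\sqrt{11} \approx 0.30\), because task B's samples cover the query point.
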
Settling the Sample Complexity of Online Reinforcement Learning

https://doi.org/10.1145/3733592

Zhang, Zihan; Chen, Yuxin; Lee, Jason; Du, Simon S (May 2025, Journal of the ACM)

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a “large-sample” regime, imposing an enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of MVP (Monotonic Value Propagation), an optimistic model-based algorithm proposed by Zhang et al. [82], achieves a regret on the order of (modulo log factors)
\begin{equation*} \min\big\{\sqrt{SAH^3K},\, HK\big\}, \end{equation*}
where \(S\) is the number of states, \(A\) is the number of actions, \(H\) is the horizon length, and \(K\) is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample sizes \(K \ge 1\), essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield ε-accuracy) of \(\frac{SAH^3}{\varepsilon^2}\) up to log factors, which is minimax-optimal for the full ε-range. Further, we extend our theory to unveil the influence of problem-dependent quantities such as the optimal value/cost and certain variances. The key technical innovation lies in a novel analysis paradigm (based on a new concept called “profiles”) that decouples complicated statistical dependency across the sample trajectories, a long-standing challenge facing the analysis of online RL in the sample-starved regime.
Free, publicly-accessible full text available May 2, 2026
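
To make the two regimes of the bound concrete, the helper below evaluates \(\min\{\sqrt{SAH^3K}, HK\}\) with constants and log factors dropped. Squaring both sides shows that \(\sqrt{SAH^3K} \le HK\) exactly when \(K \ge SAH\). The linear term \(HK\) is therefore the smaller one for small \(K\), and the square-root term takes over afterwards. The PAC count follows the usual online-to-batch reasoning: the average regret \(\sqrt{SAH^3/K}\) drops below \(\varepsilon\) once \(K \ge SAH^3/\varepsilon^2\). The function names and the sample values of S, A, H are illustrative only.

```python
import math


def regret_bound(S: int, A: int, H: int, K: int) -> float:
    """min{sqrt(S A H^3 K), H K}, ignoring constants and log factors."""
    return float(min(math.sqrt(S * A * H**3 * K), H * K))


def crossover_episodes(S: int, A: int, H: int) -> int:
    """Smallest K with sqrt(S A H^3 K) <= H K, i.e. K = S A H."""
    return S * A * H


def pac_episodes(S: int, A: int, H: int, eps: float) -> float:
    """Episodes needed for the average regret sqrt(S A H^3 / K) to reach eps."""
    return S * A * H**3 / eps**2


# Illustrative sizes: S = 10 states, A = 5 actions, horizon H = 4, so S A H = 200.
for K in (100, 200, 400):
    print(K, regret_bound(10, 5, 4, K))
```

For K = 100 the bound is the linear term 400. At K = 200 both terms equal 800. At K = 400 the square-root term (about 1131) is the smaller one.
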
Learning Optimal Tax Design in Nonatomic Congestion Games

Cui, Qiwen; Fazel, Maryam; Du, Simon S (September 2024, Advances in Neural Information Processing Systems)

In multiplayer games, self-interested behavior among the players can harm social welfare. Tax mechanisms are a common method to alleviate this issue and induce socially optimal behavior. In this work, we take a first step toward learning the optimal tax that maximizes social welfare with limited feedback in congestion games. We propose a new type of feedback named equilibrium feedback, where the tax designer can only observe the Nash equilibrium after deploying a tax plan. Existing algorithms are not applicable due to the exponentially large tax function space, the nonexistence of the gradient, and the nonconvexity of the objective. To tackle these challenges, we design a computationally efficient algorithm that leverages several novel components: (1) a piecewise linear tax to approximate the optimal tax; (2) extra linear terms to guarantee a strongly convex potential function; (3) an efficient subroutine to find the exploratory tax that can provide critical information about the game. The algorithm can find an \(\varepsilon\)-optimal tax with \(O(\beta F^2/\varepsilon^2)\) sample complexity, where \(\beta\) is the smoothness of the cost function and \(F\) is the number of facilities.
Full Text Available
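
Pigou's classic two-link network is a textbook nonatomic congestion game that the abstract does not itself discuss, but it can illustrate the equilibrium-feedback model. One unit of traffic chooses between link 1, with cost \(x\), and link 2, with constant cost 1. A tax \(\tau\) is added to link 1. The designer sees only the resulting equilibrium flow and scores it by social cost, with taxes treated as transfers and excluded. The brute-force grid search below is a stand-in for exposition, not the paper's algorithm, which instead uses piecewise linear taxes, extra strongly convex terms, and an exploration subroutine. `equilibrium_flow`, `social_cost`, and the grid are hypothetical.

```python
def equilibrium_flow(tax: float) -> float:
    """Wardrop equilibrium flow on link 1 of Pigou's network with unit demand.

    Link 1 costs x + tax to a user, link 2 costs 1. If both links carry flow,
    their costs must be equal, giving x = 1 - tax; otherwise clip to [0, 1].
    """
    return min(1.0, max(0.0, 1.0 - tax))


def social_cost(x1: float) -> float:
    """Total latency x1 * x1 + (1 - x1) * 1; taxes are transfers and excluded."""
    return x1 * x1 + (1.0 - x1)


def best_constant_tax(grid_size: int = 100) -> float:
    """Query equilibrium feedback for each tax on a uniform grid over [0, 1]."""
    taxes = [i / grid_size for i in range(grid_size + 1)]
    return min(taxes, key=lambda t: social_cost(equilibrium_flow(t)))


untaxed_cost = social_cost(equilibrium_flow(0.0))
best_tax = best_constant_tax()
taxed_cost = social_cost(equilibrium_flow(best_tax))
```

The search recovers a tax of 0.5 and lowers social cost from 1 to 0.75. That tax matches the classical marginal-cost toll \(x\,c_1'(x) = 1/2\) at the optimal flow. A grid that assigned a tax to every facility would grow exponentially with the number of facilities, so exhaustive search of this kind does not scale.
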
Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

Xu, Weihang; Fazel, Maryam; Du, Simon S (September 2024, Advances in Neural Information Processing Systems)

We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMM) in the over-parameterized setting, where a general GMM with \(n > 1\) components learns from data generated by a single ground-truth Gaussian distribution. While results for the special case of 2-Gaussian mixtures are well known, a general global convergence analysis for arbitrary \(n\) remains unresolved and faces several new technical barriers, since the convergence becomes sublinear and non-monotonic. To address these challenges, we construct a novel likelihood-based convergence analysis framework and rigorously prove that gradient EM converges globally with a sublinear rate \(O(1/\sqrt{t})\). This is the first global convergence result for Gaussian mixtures with more than 2 components. The sublinear convergence rate is due to the algorithmic nature of learning over-parameterized GMMs with gradient EM. We also identify a new technical challenge for learning general over-parameterized GMMs: the existence of bad local regions that can trap gradient EM for an exponential number of steps.
Full Text Available
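
The following is a one-dimensional toy version of the over-parameterized setting, in which data drawn from a single standard Gaussian are fit by a two-component mixture. For brevity, the sketch assumes unit variances and fixed equal weights so that only the means are learned; the paper's exact model may differ. Each gradient EM step moves every mean along the log-likelihood gradient \(\frac{1}{N}\sum_x w_i(x)(x-\mu_i)\), where \(w_i(x)\) is the posterior responsibility of component \(i\). The sample size, step size `lr`, initialization, and iteration count are arbitrary illustrative choices.

```python
import math
import random
from typing import List


def log_normal(x: float, mu: float) -> float:
    """Log density of N(mu, 1) at x."""
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)


def responsibilities(x: float, means: List[float]) -> List[float]:
    """Posterior weights w_i(x); equal mixing weights cancel out."""
    logs = [log_normal(x, m) for m in means]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(lp - top) for lp in logs]
    z = sum(exps)
    return [e / z for e in exps]


def log_likelihood(data: List[float], means: List[float]) -> float:
    """Average log-likelihood under the equal-weight mixture (log-sum-exp)."""
    n = len(means)
    total = 0.0
    for x in data:
        logs = [log_normal(x, m) - math.log(n) for m in means]
        top = max(logs)
        total += top + math.log(sum(math.exp(lp - top) for lp in logs))
    return total / len(data)


def gradient_em_step(data: List[float], means: List[float], lr: float) -> List[float]:
    """One gradient EM update: mu_i += lr * mean_x[w_i(x) * (x - mu_i)]."""
    grads = [0.0] * len(means)
    for x in data:
        w = responsibilities(x, means)
        for i, m in enumerate(means):
            grads[i] += w[i] * (x - m)
    return [m + lr * g / len(data) for m, g in zip(means, grads)]


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # single ground-truth Gaussian
init = [3.0, -3.0]  # over-parameterized: two components for one true Gaussian
means = init
for _ in range(200):
    means = gradient_em_step(data, means, lr=1.0)
```

After the loop, both means lie well inside the initial ±3, and the average log-likelihood is higher than at initialization.
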
Optimal Multi-Distribution Learning

Zhang, Zihan; Zhan, Wenhao; Chen, Yuxin; Du, Simon S; Lee, Jason D (July 2024, Conference on Learning Theory)

Full Text Available
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Jiang, Haozhe; Cui, Qiwen; Xiong, Zhihan; Fazel, Maryam; Du, Simon S (January 2024, Proceedings of the International Conference on Learning Representations)

We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve \(O(\Delta^{1/4} T^{3/4})\) regret when the degree of nonstationarity, as measured by total variation \(\Delta\), is known, and \(O(\Delta^{1/5} T^{4/5})\) regret when \(\Delta\) is unknown, where \(T\) is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on the number of agents from the oracles. As a side contribution of independent interest, we show how to test for various types of equilibria, including Nash equilibria, correlated equilibria, and coarse correlated equilibria, by a black-box reduction to single-agent learning.
Full Text Available
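
Comparing the two regret rates shows what not knowing \(\Delta\) costs. The derivation below divides the unknown-\(\Delta\) bound by the known-\(\Delta\) bound, ignoring the constants and log factors that the \(O(\cdot)\) notation hides:

```latex
\frac{\Delta^{1/5}T^{4/5}}{\Delta^{1/4}T^{3/4}}
  = \Delta^{\frac{1}{5}-\frac{1}{4}}\,T^{\frac{4}{5}-\frac{3}{4}}
  = \Delta^{-1/20}\,T^{1/20}
  = \left(\frac{T}{\Delta}\right)^{1/20}
```

The unknown-\(\Delta\) rate is therefore larger exactly when \(T > \Delta\), and only by the slowly growing factor \((T/\Delta)^{1/20}\). Both rates are sublinear in \(T\), so the average regret vanishes, whenever \(\Delta = o(T)\), since \(\Delta^{1/4}T^{3/4}/T = (\Delta/T)^{1/4}\) and \(\Delta^{1/5}T^{4/5}/T = (\Delta/T)^{1/5}\).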
Provable General Function Class Representation Learning in Multitask Bandits and MDPs

Lu, Rui; Zhao, Andrew; Du, Simon S.; Huang, Gao (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure

Yuan, Huizhuo; Li, Chris Junchi; Gidel, Gauthier; Jordan, Michael I; Gu, Quanquan; Du, Simon S (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Improved Active Multi-Task Representation Learning via Lasso

Wang, Yiping; Chen, Yifang; Jamieson, Kevin; Du, Simon S. (July 2023, International Conference on Machine Learning (ICML) 2023)

Full Text Available
Blessing of Class Diversity in Pre-training

Zhao, Yulai; Chen, Jianshu; Du, Simon S. (January 2023, International Conference on Artificial Intelligence and Statistics (AISTATS) 2023)

Full Text Available
